Formal Grammars for Linguistic Treebank Queries

Authors

  • Mark Dras
  • Steve Cassidy
Abstract

There has been recent interest in what a tree query language for linguistic corpora requires. One approach is to start from existing formal machinery, such as tree grammars and automata, and ask which kind of machine is an appropriate basis for the query language. The goal of this paper is to examine what is an appropriate machine for a linguistic tree query language, with a view to future work defining a query language based on it. We review work relating XPath to regular tree grammars and, as the paper's first contribution, show how regular tree grammars can also serve as a basis for extensions to XPath proposed for common linguistic corpus querying. As the second contribution, we demonstrate that regular tree grammars nevertheless cannot describe a number of structures of interest; we then show that a slightly more powerful machine is appropriate instead, and indicate how linguistic tree query languages might be augmented to include this extra power.


Similar resources

Lexicons and grammars for language processing: industrial or handcrafted products?

In recent years, the use of linguistic data for language processing (semantic ambiguity resolution, translation...) has increased steadily. Such data are now commonly called language resources. A few years ago, nearly all the language resources used for this purpose were collections of texts such as the Brown Corpus and the Penn Treebank, but the use of electronic lexicons (WordNet, FrameNe...


Linguistically Motivated Parallel Parsebanks

Parallel grammars and parallel treebanks can be a useful method for studying linguistic diversity and commonality. We use this approach to study how arguments to similar predicates are realized across languages. To that end, we formulate formal principles for aligning at phrase and word levels based on translational correspondences at predicate-argument level. A first version of a new tool for ...


Closing the Gap Between Stochastic and Rule-based LFG Grammars

Developing large-scale deep grammars in a constraint-based framework such as Lexical Functional Grammar (LFG) is time-consuming and requires significant linguistic insight. Recently, treebank-based constraint-grammar acquisition approaches have been developed as an alternative to hand-crafting such resources. While treebank-based approaches are wide coverage and robust and achieve competitive e...


Treebank vs. Xbar-based Automatic F-structure Annotation

Manual, large-scale (computational) grammar development is time-consuming, expensive and requires lots of linguistic expertise. More recently, a number of alternatives based on treebank resources (such as Penn-II, Susanne, AP treebank) have been explored. The idea is to automatically "induce" or rather read off (P)CFG grammars from the parse annotated treebank resources and to use the treebank g...


From Linguistic Theory to Syntactic Analysis: Corpus-Oriented Grammar Development and Feature Forest Model

The goal of this thesis is to establish a system for the automatic syntactic analysis of real-world text. Syntactic analysis in this thesis denotes computation of in-depth syntactic structures that are grounded in syntactic theories like Head-Driven Phrase Structure Grammar (HPSG). Since syntactic structures provide essential components for computing meanings of natural language sentences, the ...




Publication date: 2005